\chapter{Quick Start}
\label{chap:quick-start}

This chapter is designed to give you a whirlwind tour of HyperDex.  The first
section gives a high-level overview of a typical HyperDex deployment and
application.  Subsequent sections provide step-by-step instructions for how
HyperDex could be used to build store data for a hypothetical phone book
application.

\section{High-Level Overview}
\label{sec:quick-start:overview}

A HyperDex cluster consists of three types of components: clients, servers, and
coordinators.

Applications built on top of HyperDex are referred to as {\em clients}.  and
access HyperDex through various bindings for popular languages (e.g. C, C++,
Java, and Python) to issue operations such as \code{get}, \code{put},
\code{search}, \code{delete}, etc.

The HyperDex servers are responsible for storing the data in the system. You can
have a cluster with as few as just a single server, though typical installations
will have dozens to hundreds of servers. These servers store the data in memory
and on disk, and they can crash at any time. HyperDex can be configured to
tolerate as many failed nodes as desired.

The HyperDex coordinator maintains the entire cluster.  This involves making
sure that servers are up, as well as detecting, removing, and replacing failed
or slow nodes.  The coordinator maintains a critical data structure, the
hyperspace map, that establishes the mapping of data to servers.  Clients use
this map to locate the servers they need to contact, while servers use it to
perform object propagation and replication to achieve the application's desired
goals.

In this tutorial, we'll discuss how to get a HyperDex cluster up and running. In
particular, we'll create a simple space, insert objects into it, retrieve those
objects, and then perform queries over these objects.

\section{Starting the Coordinator}
\label{sec:quick-start:coordinator}

First, we need to start up a coordinator that will oversee the organization of
the hyperspace.

The following command starts and initializes the coordinator:

\begin{consolecode}
hyperdex coordinator -f -l 127.0.0.1 -p 1982
\end{consolecode}

The ``coordinator'' command spawns a new HyperDex coordinator process listening
on on the specified address.  All other servers are introduced to the HyperDex
cluster via this address, so it is important that it be accessible to all
servers.  For the purposes of this tutorial, binding to the loopback address
will suffice, but you will probably want to use a public IP address for a real
deployment

At this point, the coordinator is up and running, and we're ready to start up
additional nodes in our cluster.

\section{Starting HyperDex Daemons}
\label{sec:quick-start:daemon}

The HyperDex daemons are the workhorse processes that actually house the data in
the data store and respond to client requests. Let's start a daemon on the the
same machine as the coordinator by running this in another terminal window:

\begin{consolecode}
hyperdex daemon -f --listen=127.0.0.1 --listen-port=2012 \
                   --coordinator=127.0.0.1 --coordinator-port=1982 --data=/path/to/data
\end{consolecode}

This command runs HyperDex in the foreground, listens for incoming connections
on address 127.0.0.1:2012 and connects to the coordinator that we started
earlier.  This enables the daemon to announce its presence, which in turn
enables the coordinator to assign partitions of data to the newly arrived
daemon.  For example, the coordinator can direct the daemon to take over some of
the partitions from an existing daemon, participate in the data replication
protocol, and start handling client requests.  But since we have no data yet and
this is our first node, this particular daemons does not have much work to do.

Note that the second argument (127.0.0.1) is the IP address to which we want to
bind the daemon.  Once again, we are using a loopback address here for the
purposes of the tutorial, while in real life you will undoubtedly want to use
one of the IP addresses bound to your internal network.  If no address is
specified, HyperDex will attempt to autodetect an external address.

The last argument is a pointer to a directory where the daemon will store all
data.  Each HyperDex daemon must have its own data directory.

It doesn't hurt to start a few more daemon instances at this point (although it
is not required to continue with the tutorial).

You now have a functional HyperDex cluster.  It's time to do something with it.

\section{Creating a new Space}
\label{sec:quick-start:space}

Let's imagine that we are building a phone book application.  Such a phonebook
would end up keeping track of people's first name, last name, and phone number.
And to distinguish unique users, it might assign a user id to each user.  In
HyperDex, each such collection of data is called a {\em space} and is
conceptually similar to {\em tables} in traditional databases.  Let's see how we
can instruct HyperDex to create a suitable space for holding such objects.

First, let's open a Python shell and connect to HyperDex via the Admin interface:

\begin{pythoncode}
>>> import hyperdex.admin
>>> a = hyperdex.admin.Admin('127.0.0.1', 1982)
\end{pythoncode}

This line instructs the admin bindings to talk to the coordinator and get the
current cluster configuration.  There is no need for static configuration files.
Clients always receive the most up-to-date configuration, and if the
configuration changes, say, due to failures, the coordinator will detect that a
client is operating with an out-of-date configuration and instruct it to retry
with a fresh config.  The only thing a client must know to connect to a cluster
is the address of the cluster coordinator.

Let's programmatically create a new space using the Python bindings:

\begin{pythoncode}
>>> a.add_space('''
... space phonebook
... key username
... attributes first, last, int phone
... subspace first, last, phone
... create 8 partitions
... tolerate 2 failures
... ''')
True
\end{pythoncode}

This command creates a new space called ``phonebook'' that has a key of
``username'', and has attributes ``first'', ``last'', and ``phone''.  By
default, HyperDex treats every attribute as an opaque bytestring, but provides
other types as well.  Here, we specify that the phone number be treated as an
integer.  The available datatypes are discussed in
Chapter~\ref{chap:data-types}.

Even though the objects we specify will have many attributes, HyperDex will not
necessarily create one giant hyperdimensional space spanning as many dimensions
as attributes.  Doing so would not be desirable, as the space would be to large
to efficiently map onto a cluster.  Instead, we will typically want to create
{\em subspaces} consisting of smaller numbers of dimensions. The lower number of
dimensions enable the mapping from points in space to nodes in the cluster to be
more efficient; in particular, fewer nodes need to be contacted during search
operations.  In this simple example, we create a 3-dimensional subspace for the
{\em first}, {\em last} and {\em phone} attributes.  HyperDex always implicitly
creates a 1-dimensional subspace for the key of objects.

In other NoSQL systems, objects can {\em only} be retrieved efficiently by the
key under which they were inserted.  So an object <jsmith, John, Smith,
555-1234> can only be retrieved by its key ``jsmith''.  Subspaces enable
HyperDex to retrieve all ``John'' or ``Smith'' objects or, even, reverse lookups
by phone number.  The key serves as an object identifier so that objects may be
retrieved or stored efficiently.  Internally, the key is used to sequence
updates and ensure consistency.

Even though we've only deployed one server in this example, we may want to leave
room for future growth of our HyperDex cluster.  The ``create 8 partitions''
line specifies that HyperDex will partition the resulting space into 8 regions.
As a general rule, the number of partitions should be greater than the number of
daemons that will ever join the cluster.  It's perfectly acceptable to omit this
line, and HyperDex will partition the cluster into 256 regions.

Since large scale cloud-computing deployments are sure to encounter failures, we
will want to safeguard the data in our key-value store by replicating for fault
tolerance.  The ``phonebook'' space is configured to tolerate up to two
concurrent failures (``tolerate 2 failures'').  This instructs HyperDex to
protect against up to two failures by creating three replicas of each object.
Even if two daemons holding an object fail, there will still be one copy of the
object remaining.  HyperDex automatically repairs from this one remaining copy.

Finally, it's possible to create spaces using the command-line tools that ship
with HyperDex.  One could have created the ``phonebook'' space using the
command-line:

\begin{consolecode}
hyperdex add-space -h 127.0.0.1 -p 1982 << EOF
   space phonebook
   key username
   attributes first, last, int phone
   subspace first, last, phone
   create 8 partitions
   tolerate 2 failures
EOF
\end{consolecode}

\section{Interacting with the ``phonebook'' Space}
\label{sec:quick-start:phonebook}

Now that we have our hyperspace defined and ready to go, it's time to insert
some information into our phonebook.

Continuing in the same Python session, lets open a new Client connection to the
cluster.  HyperDex separates the metadata manipulation provided by the Admin
library from the data manipulation provided by the Client library.

\begin{pythoncode}
>>> import hyperdex.client
>>> c = hyperdex.client.Client('127.0.0.1', 1982)
\end{pythoncode}

We can create an object by doing the following:

\begin{pythoncode}
>>> c.put('phonebook', 'jsmith1', {'first': 'John', 'last': 'Smith',
...                                'phone': 6075551024})
True
\end{pythoncode}

This operation will determine the right spot in the hyperspace for this object
and issue the \code{put} operation to all replicas.  The operation will only
return once the object has been committed at all requisite nodes.

We can easily retrieve the same ``jsmith'' object by using a standard
\code{get}.

\begin{pythoncode}
>>> c.get('phonebook', 'jsmith1')
{'first': 'John', 'last': 'Smith', 'phone': 6075551024}
\end{pythoncode}

Yay!  We inserted an object under the key ``jsmith1'' and retrieved it using the
same key.  This looks exactly like every other NoSQL store out there, but there
are a few differences.

First, it's blazingly fast. You can look at our latest performance graphs for
the precise comparisons, but typically, HyperDex is just way faster than other
key-value stores.

Second, it's fault-tolerant. When we performed the \code{put}, our operation was
sent through a {\em value-dependent chain} of daemons assigned to the object.
The client received an acknowledgment only when the object was replicated on
every single server in the chain.  Unlike NoSQL stores that optimistically
assume that an update was committed before reaching all servers, HyperDex
responds only when all daemons have been updated.  And we can pick the
replication level to achieve any level of fault tolerance we desire.

Finally, it's consistent. If we had multiple concurrent \code{put} operations
being issued by multiple clients at the same time, we would never see an
inconsistent state.  What is an inconsistent state?  It's what you get when you
settle for {\em eventual consistency}.  For instance, we would not want a
prescription tracking system to say that we dispensed a drug, then to say we did
not, only to settle on (say) having dispensed it. Yet this is precisely what
might happen with an eventually consistent NoSQL key-value store.  Eventual
consistency is no consistency at all.  In contrast, HyperDex provides
linearizability. Time will never roll backwards from the point of any client.

And it gets better. Not only can we retrieve objects by their key, but we can
also retrieve them when we don't know their key. Here are some examples:

\begin{pythoncode}
>>> [x for x in c.search('phonebook', {'first': 'John'})]
[{'first': 'John',
  'last': 'Smith',
  'phone': 6075551024,
  'username': 'jsmith1'}]
>>> [x for x in c.search('phonebook', {'last': 'Smith'})]
[{'first': 'John',
  'last': 'Smith',
  'phone': 6075551024,
  'username': 'jsmith1'}]
\end{pythoncode}

Let's do that reverse phone number search:

\begin{pythoncode}
>>> [x for x in c.search('phonebook', {'phone': 6075551024})]
[{'first': 'John',
  'last': 'Smith',
  'phone': 6075551024,
  'username': 'jsmith1'}]
\end{pythoncode}

Here's a fully-qualified search. Hyperspace hashing makes this nearly as fast as
a key-based lookup:

\begin{pythoncode}
>>> [x for x in c.search('phonebook',
...  {'first': 'John', 'last': 'Smith', 'phone': 6075551024})]
[{'first': 'John',
  'last': 'Smith',
  'phone': 6075551024,
  'username': 'jsmith1'}]
\end{pythoncode}

Let's add another user named ``John Doe'':

\begin{pythoncode}
>>> c.put('phonebook', 'jd', {'first': 'John', 'last': 'Doe', 'phone': 6075557878})
True
>>> [x for x in c.search('phonebook',
...  {'first': 'John', 'last': 'Smith', 'phone': 6075551024})]
[{'first': 'John',
  'last': 'Smith',
  'phone': 6075551024,
  'username': 'jsmith1'}]
>>> [x for x in c.search('phonebook', {'first': 'John'})]
[{'first': 'John',
  'last': 'Smith',
  'phone': 6075551024,
  'username': 'jsmith1'},
 {'first': 'John',
  'last': 'Doe',
  'phone': 6075557878,
  'username': 'jd'}]
>>> [x for x in c.search('phonebook', {'last': 'Smith'})]
[{'first': 'John',
  'last': 'Smith',
  'phone': 6075551024,
  'username': 'jsmith1'}]
>>> [x for x in c.search('phonebook', {'last': 'Doe'})]
[{'first': 'John',
  'last': 'Doe',
  'phone': 6075557878,
  'username': 'jd'}]
\end{pythoncode}

Should John Doe decide he no longer wants to be listed in the phonebook, it's
trivial to remove his listing:

\begin{pythoncode}
>>> c.delete('phonebook', 'jd')
True
>>> [x for x in c.search('phonebook', {'first': 'John'})]
[{'first': 'John',
  'last': 'Smith',
  'phone': 6075551024,
  'username': 'jsmith1'}]
\end{pythoncode}

Suppose John Smith needs to change his phone number. This is easily accomplished
by specifying just the key for the object and the changed attribute.  All other
attributes will be preserved.

\begin{pythoncode}
>>> c.put('phonebook', 'jsmith1', {'phone': 6075552048})
True
>>> c.get('phonebook', 'jsmith1')
{'first': 'John',
  'last': 'Smith',
  'phone': 6075552048}
\end{pythoncode}

Smith is a popular name.  Let's say there was ``John Smith'' from Rochester
(area code 585):

\begin{pythoncode}
>>> c.put('phonebook', 'jsmith2',
...          {'first': 'John', 'last': 'Smith', 'phone': 5855552048})
True
>>> c.get('phonebook', 'jsmith2')
{'first': 'John',
  'last': 'Smith',
  'phone': 5855552048}
\end{pythoncode}

Suppose we want to locate everyone named ``John Smith'' from Ithaca (area code
607). We can do this with a range query in HyperDex.

\begin{pythoncode}
   >>> [x for x in c.search('phonebook',
   ...  {'last': 'Smith', 'phone': (6070000000, 6080000000)})]
   [{'first': 'John',
     'last': 'Smith',
     'phone': 6075552048,
     'username': 'jsmith1'}]
\end{pythoncode}

Or perhaps we want to search for everyone whose name falls between ``'Jack'``
and ``'Joseph'``:

\begin{pythoncode}
>>> [x for x in c.search('phonebook',
...  {'first': ('Jack', 'Joseph')})]
[{'first': 'John',
  'last': 'Smith',
  'phone': 6075552048,
  'username': 'jsmith1'},
 {'first': 'John',
  'last': 'Smith',
  'phone': 5855552048,
  'username': 'jsmith2'}]
\end{pythoncode}

\section{Cleaning Up}
\label{sec:quick-start:cleanup}

When we're done with the ``phonebook'' space, we can delete the space and all
the items in it directly from Python:

\begin{pythoncode}
>>> a.rm_space('phonebook')
True
\end{pythoncode}
